Introduction to the Scikit-Learn Package

Scikit-Learn is a Python package for machine learning education and practice. It provides the following components.

Sample data sets for benchmarking
Data preprocessing
Supervised learning
Unsupervised learning
Model evaluation and selection

For details, see the following website.

http://scikit-learn.org

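Machine Learning Models Provided by scikit-learn

One advantage of scikit-learn is that it provides a wide range of machine learning models, that is, algorithms, in a single package. The following is a list of the machine learning models it provides. The list covers only representative models, and new ones are added continually.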

Supervised learning

http://scikit-learn.org/stable/supervised_learning.html
Generalized Linear Models
- Ordinary Least Squares
- Ridge/Lasso/Elastic Net Regression
- Logistic regression
- Polynomial regression
- Perceptron
Linear and Quadratic Discriminant Analysis
Support Vector Machines
Stochastic Gradient Descent
Nearest Neighbor Algorithms
Gaussian Processes
Naive Bayes
Decision Trees
Ensemble methods
- Random Forests
- AdaBoost

Unsupervised learning

http://scikit-learn.org/stable/unsupervised_learning.html

Gaussian mixture models
Manifold learning
Clustering
- K-means
- DBSCAN
Biclustering
Decomposing
- Principal component analysis (PCA)
- Factor Analysis
- Independent component analysis (ICA)
- Latent Dirichlet Allocation (LDA)
Covariance estimation
Novelty and Outlier Detection
Density Estimation

Subpackages of scikit-learn

scikit-learn provides separate functionality in each subpackage. The representative subpackages and their functions are listed below.

Data:
- sklearn.datasets: sample data sets
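
As a minimal sketch of how a sample data set is loaded from sklearn.datasets, the following uses the bundled iris data. The choice of iris is only for illustration; the other load_* functions follow the same pattern.

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch, a dict-like container with attribute access
iris = load_iris()

X = iris.data    # feature matrix, one row per sample
y = iris.target  # integer class label for each sample

print(X.shape, y.shape)
print(iris.feature_names)
print(iris.target_names)
```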

Data preprocessing:
- sklearn.preprocessing: simple preprocessing such as encoding (imputation is provided by sklearn.impute as of scikit-learn 0.20)
- sklearn.feature_extraction: Feature Extraction
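
The two preprocessing subpackages can be sketched with one small example each. The toy color and city records below are made up for illustration. OneHotEncoder stands in for the encoding step and DictVectorizer for feature extraction; both are only representative choices.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OneHotEncoder

# Encoding: turn a categorical column into one indicator column per category
colors = [["red"], ["green"], ["red"]]
encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()  # sparse result converted to a dense array
print(encoder.categories_)  # categories are sorted, so "green" comes before "red"
print(encoded)

# Feature extraction: turn a list of dict records into a numeric matrix
records = [
    {"city": "Seoul", "temp": 20.0},
    {"city": "Busan", "temp": 25.0},
]
vec = DictVectorizer(sparse=False)
features = vec.fit_transform(records)
print(vec.feature_names_)  # string values become "key=value" indicator columns
print(features)
```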

Models:
- sklearn.base: Base classes and utility functions
- sklearn.pipeline: Pipeline
- sklearn.linear_model: Generalized Linear Models
- sklearn.naive_bayes: Naive Bayes
- sklearn.discriminant_analysis: Discriminant Analysis
- sklearn.neighbors: Nearest Neighbors
- sklearn.mixture: Gaussian Mixture Models
- sklearn.svm: Support Vector Machines
- sklearn.tree: Decision Trees
- sklearn.ensemble: Ensemble Methods
- sklearn.cluster: Clustering

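Model evaluation:
- sklearn.metrics: Metrics
- sklearn.model_selection: Cross Validation and Grid Search (replaces the sklearn.cross_validation and sklearn.grid_search modules, which were removed in scikit-learn 0.20)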
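
A minimal sketch of cross-validation and grid search follows. It imports from sklearn.model_selection, which hosts both features in current scikit-learn releases. The linear SVC model, the iris data, and the candidate values of C are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: returns one accuracy score per fold
scores = cross_val_score(SVC(kernel="linear", C=1.0), X, y, cv=5)
print(scores.mean())

# Grid search: evaluates every candidate C with 5-fold cross-validation
search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```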

Classes in scikit-learn

To use scikit-learn, you create an object of the class that provides the functionality you need. scikit-learn offers many classes, but most of them fall into the following three groups.

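Preprocessing classes
- Transformer classes
  - Data transformation
- Common methods
  - fit(): estimate model parameters (training)
  - transform(): transform the data
  - fit_transform(): estimate model parameters and transform the data in one call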
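
The three common methods can be sketched with StandardScaler, used here only as a representative Transformer. The small arrays are made-up illustration data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

scaler = StandardScaler()
scaler.fit(X_train)                         # estimates each column's mean and standard deviation
X_train_scaled = scaler.transform(X_train)  # applies the estimated statistics
X_new_scaled = scaler.transform(X_new)      # new data reuses the training statistics

# fit_transform() does the fit and the transform on the same data in one call
X_same = StandardScaler().fit_transform(X_train)

print(scaler.mean_)
print(X_train_scaled)
print(X_new_scaled)
```

Fitting on the training data and then calling transform() on new data keeps the new data from influencing the estimated statistics.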

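Machine learning model classes
- Regressor classes
  - Regression analysis
- Classifier classes
  - Classification
- Cluster classes
  - Clustering
- Common methods
  - fit(): estimate model parameters (training)
  - predict(): predict y for the given x values
  - score(): evaluate performance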
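
As a hedged illustration of the common methods, the sketch below runs fit(), predict(), and score() on one Classifier and one Regressor. LogisticRegression, LinearRegression, the iris data, and the toy line y = 2x + 1 are choices made only for this example. For classifiers score() returns mean accuracy, and for regressors it returns the R^2 coefficient.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classifier: fit on training data, then predict and score on held-out data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)  # mean accuracy on the test set
print(accuracy)

# Regressor: the same three methods on toy data that lies exactly on y = 2x + 1
X_line = np.array([[0.0], [1.0], [2.0], [3.0]])
y_line = np.array([1.0, 3.0, 5.0, 7.0])
reg = LinearRegression()
reg.fit(X_line, y_line)
print(reg.predict([[4.0]]))
r2 = reg.score(X_line, y_line)  # R^2 on the training data
print(r2)
```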

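Pipeline class
- Chains multiple Preprocessors and a Model so that they behave as a single Model
- Provides all the common methods of the Model classes
- Inside the pipeline, the Preprocessors transform the data in sequence, and the result is finally passed to the Model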
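
The behavior described above can be sketched with a two-step pipeline. The step names "scale" and "clf", and the choice of StandardScaler with LogisticRegression on the iris data, are assumptions made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each step is a (name, object) pair; the last step is the model
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# fit() fits the scaler, transforms the data, then fits the classifier on the result
pipe.fit(X_train, y_train)

# predict() and score() apply the fitted scaler before calling the classifier
y_pred = pipe.predict(X_test)
pipe_accuracy = pipe.score(X_test, y_test)
print(pipe_accuracy)
```

Because the pipeline exposes the same fit(), predict(), and score() methods as a single model, it can be used wherever a single model is expected.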